Melbourne Bicycle Network Routes and Road Safety: Part 1

Authored by: Bree McLennan and Anugra Sara Thomas

Duration: 120 mins 

Level: Intermediate

Pre-requisite Skills: Python, Data engineering and analysis

Scenario

1. As a cyclist, I want a safe transport journey in the city of Melbourne. Which roads are safest to cycle on?

From a cyclist's perspective, I want a safe transport journey in Melbourne. I seek to understand which sections of the bicycle road network are the safest and which have the highest occurrence of accidents, so that I can use this insight to plan the safest possible journey.

2. As a council, we seek to invest in road safety initiatives which reduce the occurrences of accidents resulting in serious injuries of citizens using our road network. Where are accident hotspots for cyclists occurring?

From a leadership & strategic perspective, as a council, we seek to invest in road safety initiatives which can effectively reduce the occurrences of accidents resulting in serious injuries and fatalities of citizens using our road network.

Where are accident hot spots for cyclists (and other road users) occurring?

Are our current approaches to road and bicycle network design working and having the impact we expected?

Objectives for the Exploratory Data Analysis

Using the power of data aggregation, we can combine Melbourne Open Data datasets, such as transport networks and events, with open government datasets, including traffic accident ‘crash stats’ from Victoria Police and traffic event data from VicRoads, and begin to observe, analyse and report on geographical patterns between these datasets.

We can ask questions such as:

  1. Are accidents which involve cyclists occurring on designated bicycle network paths or on different roads? Where, specifically?
  2. What are the traffic accident circumstances? Are cyclists colliding with open car doors an increasing or decreasing problem?
  3. Do accidents co-occur with road-works, weather events or other disruptive events?

Goals for exploratory data analysis:

  1. Analyse the frequency, timing and characteristics of traffic accidents involving cyclists in Melbourne
  2. Analyse the geographical location of these accidents and compare to the locations of the bicycle path network
  3. Report on sections of roadway where higher accident rates are observed

Strategic benefits for City of Melbourne

This use case and exploratory data analysis project can support the City of Melbourne in the following ways:

  1. Support for the ‘Safety and Well-being’ strategic vision and goals

  2. Influence the creation of a ‘key risk indicator’ to monitor progress on the reduction of the ‘Number of transport-related injuries and fatalities’ on Melbourne roads

  3. Support further discussion between City of Melbourne and Victorian Road Safety partner agencies to improve road network design and infrastructure programs

Background on Related Road Safety Initiatives in Victoria

=======TODO: ADD IN CURRENT VICTORIAN ROAD SAFETY STATS==========

The approach to aggregating key data sources and analysing geographical attributes is currently used by the TAC (Transport Accident Commission) in Victoria when analysing accident hot-spots and reviewing whether the design of the road could be improved to reduce road trauma.

This type of analysis was used by TAC in recent years to assess fatal accident hotspots in Geelong.

The TAC, in partnership with the Victorian Road Safety partner agencies, discovered a cluster of fatal accidents occurring over a 5-year period along a specific stretch of road at Thompsons Road, North Geelong.

The analysis informed a strategic decision for road safety partners (Victoria Police, VicRoads, City of Greater Geelong, TAC) to re-design the road to make it safer.

The road re-design has resulted in a substantial reduction in road trauma along Thompsons Road in North Geelong.

A similar analysis technique and approach could be applied to the City of Melbourne road network.

REFERENCE:

[1] https://regionalroads.vic.gov.au/map/barwon-south-west-improvements/thompson-road-safety-improvements

Exploratory Data Analysis Workflow Steps

  1. Document the data considerations and risk assessments

  2. Prepare the Traffic Accident 'crash-stats' source data (this is handled by a separate Python notebook)

  3. Access and read-in the Melbourne Bicycle Network dataset via the SOCRATA API

  4. Explore the Melbourne Bicycle Network dataset as a geoJSON file

  5. Read-in the pre-processed Traffic Accident 'crash-stats' dataset

  6. Explore the Traffic Accident 'crash-stats' dataset

  7. Visualise the geographical features of the Melbourne Bicycle Network overlaid with the Traffic Accident 'crash-stats' dataset

Dataset list:

  1. Melbourne Open Data Bicycle-Network
  2. VicRoads Accident data

 Data Considerations

1. Information Security and Sensitivity

For the purpose of analysis, the datasets contain de-identified data. No personally identifiable names or contact details are used or included.

2. Converting raw traffic accident 'crash-stats' data into a useful dataset

After initial observation of the traffic accident data in its raw form, the raw data was prepared and converted into a working ‘.csv’ file and imported into this notebook for further analysis.

The following process was used for converting the raw data into a working dataset:

  1. The accident context domains 'person', 'accident' and 'node' were used to form the foundation of the working dataset

    • From the domain 'person', the variable 'road user type' was used to identify 'Bicyclists'
    • From the domain 'accident', the variable 'accidentdate' was used to filter for accident records on and after '1st Jan 2017'
    • From the domain 'node', the variable 'lga_name' was used to identify the broad geographical area of 'MELBOURNE' city
  2. A series of two inner merges were then performed to construct the working dataset

    • First inner merge on the filtered 'accident' domain and filtered 'person' domain using the primary key 'accident_no'
    • Second inner merge on the output from the step described above, and on the filtered 'node' domain, using 'accident_no'
  3. To obtain additional traffic accident descriptive features, five additional data domains were left joined in sequence

    • 'surface conditions' using the primary key 'accident_no' to join
    • 'SUB DCA' using the primary key 'accident_no' to join
    • 'Atmospheric Conditions' using the primary key 'accident_no' to join
    • 'Accident Location' using the primary key 'accident_no' to join
    • 'Accident Event' using the primary key 'accident_no' to join
  4. Variable naming conventions were applied

Variables created in the working dataset use a three-letter prefix to denote the expected general data type of their values, for example 'TIM' in 'TIMAccidentTime_accident' and 'CAT' in 'CATDCADesc_accident'.

A suffix beginning with an underscore was also used to denote the context data domain origin for each feature. For example "_person" denotes a variable which originated from the accident person domain dataset.
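The filtering and merging steps listed above can be sketched with pandas on toy data. This is a minimal illustration, not the code from the separate preparation notebook: the frames, their values and the column names `road_user_type` and `surface_cond` are assumed stand-ins for the raw crash-stats fields, and only one of the five left joins is shown.

```python
import pandas as pd

# Toy stand-ins for the raw domains (all values are illustrative only).
accident = pd.DataFrame({
    "accident_no": ["A1", "A2", "A3"],
    "accidentdate": pd.to_datetime(["2016-05-01", "2018-03-12", "2019-07-30"]),
})
person = pd.DataFrame({
    "accident_no": ["A1", "A2", "A3"],
    "road_user_type": ["Bicyclists", "Bicyclists", "Drivers"],
})
node = pd.DataFrame({
    "accident_no": ["A1", "A2", "A3"],
    "lga_name": ["MELBOURNE", "MELBOURNE", "MELBOURNE"],
})
surface = pd.DataFrame({"accident_no": ["A2"], "surface_cond": ["Dry"]})

# Step 1: filter each foundation domain.
accident_f = accident[accident["accidentdate"] >= pd.Timestamp("2017-01-01")]
person_f = person[person["road_user_type"] == "Bicyclists"]
node_f = node[node["lga_name"] == "MELBOURNE"]

# Step 2: two inner merges on the shared key 'accident_no'.
working = accident_f.merge(person_f, on="accident_no", how="inner")
working = working.merge(node_f, on="accident_no", how="inner")

# Step 3: left join a descriptive domain so that accidents without
# a matching row are kept (with missing values) rather than dropped.
working = working.merge(surface, on="accident_no", how="left")

print(working)
```

Inner merges keep only accidents present in every filtered domain, while the left joins add descriptive columns without shrinking the working dataset.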

Manual data inspection notes:

After creating the working dataset, data opportunities were discovered to create new variables to assist with the analysis:

3. Data cleaning & pre-processing

Excess whitespace characters were detected in the variables 'TIMAccidentTime_accident' and 'CATDCADesc_accident' and were removed.
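A minimal sketch of this clean-up step with pandas is shown below. The two column names come from the text above; the sample values are invented for illustration, and the sketch assumes the excess whitespace is leading or trailing.

```python
import pandas as pd

# Illustrative rows only; the real values come from the crash-stats extract.
df = pd.DataFrame({
    "TIMAccidentTime_accident": [" 08:15:00 ", "17:40:00  "],
    "CATDCADesc_accident": ["  EXAMPLE DESCRIPTION A ", "EXAMPLE DESCRIPTION B  "],
})

# Strip leading and trailing whitespace from each affected text column.
for column in ["TIMAccidentTime_accident", "CATDCADesc_accident"]:
    df[column] = df[column].str.strip()

print(df)
```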

4. Geographical Location Data

To answer queries on the geographical locations of accidents, the analysis dataset requires longitude and latitude data to drive geographical mapping tools and visualisations. The longitude and latitude data is captured when accident records are entered into the source system.

5. Additional Data

None identified.

6. Data Integrity Checks and Filtering

 Setting Up for Analysis

To begin the analysis we first import the necessary libraries to support our exploratory data analysis using Melbourne Open data.

The following are core packages required for this exercise:

To connect to the Melbourne Open Data Portal we must establish a connection using the sodapy library by specifying a domain (the website domain where the data is hosted) and an application access token, which can be requested by registering with the City of Melbourne Open Data portal.

For this exercise we will access the domain without an application token.

 Exploring the Melbourne Bicycle Network Dataset

Next, we will look at the Bicycle-Network dataset, to better understand its structure and how we can use it.

Our data requirements from this use case include the following:

For this exercise, we start by examining the Bicycle-Network dataset. Each dataset in the Melbourne Open Data Portal has a unique identifier which can be used to retrieve the dataset using the sodapy library.

The Bicycle-Network dataset unique identifier is '3bxw-t8fk'. We will pass this identifier into the sodapy command below to retrieve this data.

The dataset is placed in a pandas DataFrame so that we can inspect its metadata.
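The connection and retrieval steps described above might look like the following sketch. It assumes the portal domain is `data.melbourne.vic.gov.au` and that the portal still serves Socrata endpoints; `make_client` and `fetch_dataframe` are hypothetical helper names introduced here, not part of sodapy.

```python
import pandas as pd

DOMAIN = "data.melbourne.vic.gov.au"  # assumed portal domain
DATASET_ID = "3bxw-t8fk"              # identifier quoted in the text above


def make_client(domain=DOMAIN, app_token=None):
    """Create a sodapy client; app_token=None means access without a token."""
    # Imported here so fetch_dataframe can be used without sodapy installed.
    from sodapy import Socrata
    return Socrata(domain, app_token)


def fetch_dataframe(client, dataset_id, limit=2000):
    """Fetch up to `limit` records and return them as a DataFrame."""
    records = client.get(dataset_id, limit=limit)  # list of dictionaries
    return pd.DataFrame.from_records(records)
```

With network access, `bicycle_df = fetch_dataframe(make_client(), DATASET_ID)` followed by `bicycle_df.info()` would list the columns and their types. Requests made without a token are typically subject to stricter rate limits.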

Working with the Melbourne Bicycle Network Routes Dataset as a geoJSON file

The code below describes how to access the Bicycle Network dataset as a geoJSON file through the SOCRATA API.
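One way to do this is sketched below, using only the standard library. It assumes the portal exposes the usual Socrata resource endpoint pattern `https://<domain>/resource/<dataset id>.geojson`; the helper names `geojson_url` and `load_geojson` are hypothetical.

```python
import json
from urllib.request import urlopen


def geojson_url(domain, dataset_id):
    """Build the assumed Socrata GeoJSON endpoint for a dataset."""
    return f"https://{domain}/resource/{dataset_id}.geojson"


def load_geojson(url, timeout=30):
    """Download a GeoJSON document and parse it into a dictionary."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With network access, `bicyclepath = load_geojson(geojson_url("data.melbourne.vic.gov.au", dataset_id))` would load the file into a dictionary, where `dataset_id` is the identifier of the bicycle routes dataset.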

Accessing the first record in the geoJSON file

To observe the type of data and values stored within the geoJSON file we can use the following code to observe the first record.
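As a minimal sketch, the snippet below parses a tiny hand-written GeoJSON document and reads its first record. The property names and coordinate values are hypothetical stand-ins, not the real dataset schema.

```python
import json

# A small GeoJSON document standing in for the downloaded file.
sample_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"name": "Example Street", "type": "On-Road Bike Lane"},
      "geometry": {
        "type": "LineString",
        "coordinates": [[144.9600, -37.8100], [144.9650, -37.8120]]
      }
    }
  ]
}
"""

bicyclepath = json.loads(sample_text)

# The first record is the first entry in the 'features' list.
first_record = bicyclepath["features"][0]
print(first_record["properties"])
print(first_record["geometry"]["type"])
print(first_record["geometry"]["coordinates"][0])  # [longitude, latitude]
```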

Observing the geoJSON Full Structure and Properties

By calling the variable 'bicyclepath' we can observe the full structure, properties and values of the geoJSON file.

Navigating the geoJSON File Structure

When you load a GeoJSON file using the json library, you get a dictionary that contains an entry 'features', which holds the list of features. Each feature is in turn a dictionary that contains an entry 'geometry'.

The geometry is a dictionary containing the entries 'type' and 'coordinates'.

The GeoJSON file can be traversed or navigated using the following code:
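The traversal can be sketched as a loop over the 'features' list. The sample dictionary and its property names are invented for illustration, and the sketch assumes the network is stored as LineString or MultiLineString geometries; other geometry types are skipped.

```python
bicyclepath = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Example path A"},
         "geometry": {"type": "LineString",
                      "coordinates": [[144.960, -37.810], [144.965, -37.812]]}},
        {"type": "Feature", "properties": {"name": "Example path B"},
         "geometry": {"type": "MultiLineString",
                      "coordinates": [
                          [[144.970, -37.815], [144.972, -37.816]],
                          [[144.972, -37.816], [144.975, -37.818], [144.978, -37.820]],
                      ]}},
    ],
}

summary = []
for feature in bicyclepath["features"]:
    geometry = feature["geometry"]
    geometry_type = geometry["type"]
    coordinates = geometry["coordinates"]
    # A LineString is one list of [lon, lat] points;
    # a MultiLineString is a list of such lists.
    if geometry_type == "LineString":
        lines = [coordinates]
    elif geometry_type == "MultiLineString":
        lines = coordinates
    else:
        continue  # geometry types not handled in this sketch
    point_count = sum(len(line) for line in lines)
    summary.append((feature["properties"]["name"], geometry_type, point_count))

for row in summary:
    print(row)
```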

Visualising the Melbourne Bicycle Network on a Map

To visualise the geoJSON file containing the Melbourne Bicycle Network we can use the 'folium' and 'json' packages and the following code.
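A minimal sketch of such a map is given below. `geojson_centre` and `build_map` are hypothetical helpers: the first averages the line coordinates to choose a map centre, and the second draws the network with `folium.GeoJson`. Only LineString and MultiLineString geometries are considered, and the input is assumed to contain at least one of them.

```python
def geojson_centre(geojson):
    """Return the (latitude, longitude) mean of all line points."""
    lats, lons = [], []
    for feature in geojson["features"]:
        geometry = feature["geometry"]
        coords = geometry["coordinates"]
        if geometry["type"] == "LineString":
            lines = [coords]
        elif geometry["type"] == "MultiLineString":
            lines = coords
        else:
            continue
        for line in lines:
            for point in line:
                lons.append(point[0])  # GeoJSON stores longitude first
                lats.append(point[1])
    return sum(lats) / len(lats), sum(lons) / len(lons)


def build_map(geojson, zoom_start=13):
    """Draw the GeoJSON lines on a folium map centred on the data."""
    # Imported here so geojson_centre can be used without folium installed.
    import folium
    centre = geojson_centre(geojson)
    bike_map = folium.Map(location=list(centre), zoom_start=zoom_start)
    folium.GeoJson(geojson, name="Bicycle network").add_to(bike_map)
    folium.LayerControl().add_to(bike_map)
    return bike_map
```

In a notebook, evaluating `build_map(bicyclepath)` as the last expression of a cell renders the interactive map; `build_map(bicyclepath).save("bicycle_network.html")` writes it to a file instead.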

Exploring the Traffic Accident 'Crash-Stats' Dataset

This section focuses on setting up the Traffic Accident 'Crash-Stats' dataset and preparing it for use in the exploratory data analysis alongside the Melbourne Bicycle Network dataset.

The raw input dataset contains the following structure:

Setting up the Working Accident 'Crash-Stats' Dataset

The working dataset will have the following structure.

Inspecting the value sets for each variable in the working dataset

Here we will broadly check the value sets for each variable. The information from this check will inform what types of values to expect for each column and cultivate thinking on what values constitute missing or invalid entries and how to deal with this situation.
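One hedged way to run this check is to count the distinct values of every column, including missing ones. The frame below is a toy stand-in for the working dataset: the column names come from the text above and the values are invented.

```python
import pandas as pd

working = pd.DataFrame({
    "CATDCADesc_accident": ["TYPE A", "TYPE B", "TYPE A", None],
    "TIMAccidentTime_accident": ["08:15:00", "17:40:00", "08:15:00", "12:00:00"],
})

# dropna=False keeps missing entries in the counts so they are visible.
value_sets = {
    column: working[column].value_counts(dropna=False)
    for column in working.columns
}

for column, counts in value_sets.items():
    print(f"--- {column} ({working[column].nunique()} distinct values)")
    print(counts)
```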

Inspecting 'Accidents Per Year (All-Time)'

In this section we will explore and observe how many bicycle accidents have occurred each year.

Important Note: The year 2020 is incomplete as an accident year because the last record date in the dataset is March 2020.

We will use 'seaborn' and 'matplotlib' libraries for visualisations.
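The yearly count can be sketched as a pandas aggregation. The column name `accident_date` and the dates are hypothetical; the working dataset's own date column would be used instead.

```python
import pandas as pd

accidents = pd.DataFrame({
    "accident_date": pd.to_datetime(
        ["2017-02-03", "2017-11-20", "2018-06-01", "2020-03-15"]
    ),
})

# Count records per calendar year, ordered chronologically.
per_year = accidents["accident_date"].dt.year.value_counts().sort_index()
print(per_year)
```

The resulting Series can be passed to a bar chart, for example `seaborn.barplot(x=per_year.index, y=per_year.values)`.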

Inspecting 'Accidents Per Hour (All-Time)'

In this section we will explore and observe how many bicycle accidents have occurred by accident hour.

We will use 'seaborn' and 'matplotlib' libraries for visualisations.

Inspecting 'Accidents per Weekday (All-Time)'

In this section we will explore and observe how many bicycle accidents have occurred by weekday.

We will use 'seaborn' and 'matplotlib' libraries for visualisations.

Inspecting 'Accidents per Day (All-Days, Detailed)'

In this section we will explore and observe how many bicycle accidents have occurred each day since the earliest date recorded in the dataset.

We will use 'seaborn' and 'matplotlib' libraries for visualisations.

Inspecting 'Accidents per Weekday and Hour grouping (All-Time)'

In this section we will explore and observe how many bicycle accidents have occurred each weekday and hour grouping.

We will use 'seaborn' and 'matplotlib' libraries for visualisations.
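A weekday-by-hour grid can be sketched with `pandas.crosstab`. The column name `accident_datetime` and the timestamps are hypothetical.

```python
import pandas as pd

accidents = pd.DataFrame({
    "accident_datetime": pd.to_datetime(
        ["2018-03-12 08:15", "2018-03-19 08:40", "2018-03-13 17:05"]
    ),
})

accidents["weekday"] = accidents["accident_datetime"].dt.day_name()
accidents["hour"] = accidents["accident_datetime"].dt.hour

# Rows are weekdays, columns are hours, cells are accident counts.
grid = pd.crosstab(accidents["weekday"], accidents["hour"])

# Put the rows in calendar order and show weekdays with no records as zero.
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]
grid = grid.reindex(weekday_order, fill_value=0)
print(grid)
```

A grid in this shape can be drawn directly as a heatmap, for example with `seaborn.heatmap(grid)`.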

Inspecting the 'Geography of Bicycle Accident Occurrences'

In this section we will explore and observe the frequency of bicycle accidents by geographical locations.

We will use 'seaborn' and 'matplotlib' libraries for visualisations.

Visualising the Bicycle Accident Data on a Map

Creating the first map visual to observe where bicycle accidents are occurring

Visualising the Melbourne Bicycle Route Network and Bicycle Accident Data

Creating an Alternative Map Visual to Distinguish Accidents by Year and Injury Type

To observe the Melbourne Bicycle Route geoJSON map overlaid with the Bicycle Accident Data we can use the following code.
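A possible shape for this overlay is sketched below: one toggleable folium layer per accident year, with marker colour chosen by injury type. The record keys (`year`, `lat`, `lon`, `injury`), the injury labels, the colour mapping and the helper names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical injury labels mapped to marker colours.
INJURY_COLOURS = {"Serious injury": "red", "Other injury": "orange"}


def group_by_year(records):
    """Group accident records by year, returned in ascending year order."""
    groups = defaultdict(list)
    for record in records:
        groups[record["year"]].append(record)
    return dict(sorted(groups.items()))


def build_overlay_map(bicyclepath, records, centre=(-37.8136, 144.9631), zoom_start=13):
    """Overlay accident markers, one layer per year, on the bicycle network."""
    # Imported here so group_by_year can be used without folium installed.
    import folium
    overlay = folium.Map(location=list(centre), zoom_start=zoom_start)
    folium.GeoJson(bicyclepath, name="Bicycle network").add_to(overlay)
    for year, year_records in group_by_year(records).items():
        layer = folium.FeatureGroup(name=f"Accidents {year}")
        for record in year_records:
            folium.CircleMarker(
                location=[record["lat"], record["lon"]],
                radius=4,
                color=INJURY_COLOURS.get(record["injury"], "gray"),
                fill=True,
                popup=f"{year}: {record['injury']}",
            ).add_to(layer)
        layer.add_to(overlay)
    folium.LayerControl().add_to(overlay)  # lets the reader toggle years
    return overlay
```

Grouping markers into `FeatureGroup` layers keeps the map readable, since each year can be switched on or off from the layer control.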

  Findings and Opportunities

This analysis has provided a comprehensive starting point for inspecting the Melbourne Open Data Bicycle Network dataset and Traffic Accidents (Bicyclists) data.

We achieved in this analysis:

We learned from this analysis:

Observations for further opportunities

  References

[1] Thompson Road North Geelong Road Safety Improvements https://regionalroads.vic.gov.au/map/barwon-south-west-improvements/thompson-road-safety-improvements

[2] Victorian 'Crash-Stats' dataset https://discover.data.vic.gov.au/dataset/crash-stats-data-extract/resource/392b88c0-f010-491f-ac92-531c293de2e9

[3] Bicycle Routes Dataset https://data.melbourne.vic.gov.au/Transport/Bicycle-routes-including-informal-on-road-and-off-/24aw-nd3i

Technical References

[4] Accessing geoJSON data https://stackoverflow.com/questions/48263802/finding-location-using-geojson-file-using-python

[5] Accessing geoJSON data https://medium.com/analytics-vidhya/measure-driving-distance-time-and-plot-routes-between-two-geographical-locations-using-python-39995dfea7e

[6] Visualising a geoJSON dataset https://python-visualization.github.io/folium/quickstart.html#GeoJSON/TopoJSON-Overlays

[7] Visualising categorised data on a map https://www.geeksforgeeks.org/python-adding-markers-to-volcano-locations-using-folium-package/

[8] Creating point plot group layers with folium https://towardsdatascience.com/creating-an-interactive-map-of-wildfire-data-using-folium-in-python-7d6373b6334a

[9] Ideas for further opportunities - Time Series Analysis https://geohackweek.github.io/ghw2018_web_portal_inlandwater_co2/InteractiveTimeSeries.html